Analyzing Open-Ended Responses in R

A Tutorial Using Real Teaching Evaluation Data

Introduction

This tutorial walks through techniques for analyzing open-ended text responses using R. We’ll use real data from my teaching evaluations spanning 2017–2025, covering courses in Consumer Behavior, Personal Selling & Sales Management, and Marketing Research.

Note: Why Use This Data?

Open-ended survey responses are everywhere in marketing research — customer feedback, product reviews, social media comments, focus group transcripts. The techniques you’ll learn here apply to all of these contexts. Using teaching evaluations is just a convenient (and somewhat vulnerable!) way to demonstrate with real data.

By the end of this tutorial, you’ll be able to:

  1. Load and clean text data
  2. Tokenize text and remove stop words
  3. Calculate word frequencies
  4. Create word clouds
  5. Compare text across groups (courses, modalities)
  6. Perform basic sentiment analysis

Setup

First, let’s load the packages we’ll need:

# Install packages if needed (uncomment and run once)
# install.packages(c("tidyverse", "tidytext", "wordcloud", "wordcloud2", 
#                    "reshape2", "textdata", "scales"))

library(tidyverse)
library(tidytext)
library(wordcloud)
library(wordcloud2)
library(scales)

Part 1: Loading and Exploring the Data

Load the data

# Load the teaching evaluation data
evals <- read_csv("teaching_evaluations.csv")

# Quick peek
glimpse(evals)

Explore the structure

# How many comments per course?
evals %>%
  count(course_name, sort = TRUE)

# How many by modality?
evals %>%
  count(modality)

# How many by semester?
evals %>%
  count(semester, sort = TRUE)

Preview some comments

# Look at a few random comments
evals %>%
  sample_n(5) %>%
  select(course_name, modality, comment_text)

Part 2: Text Cleaning and Tokenization

Text analysis requires breaking text into individual units (usually words). This process is called tokenization. Before we tokenize, we often need to clean the text.

Basic tokenization with tidytext

The tidytext package makes this straightforward with unnest_tokens():

# Tokenize: one row per word
words <- evals %>%
  unnest_tokens(word, comment_text)

# How many total words?
nrow(words)

# Preview
head(words, 20)
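To see exactly what unnest_tokens() is doing here, consider a tiny toy example (the sentence is invented for illustration). By default it lowercases the text and strips punctuation:

```r
library(dplyr)
library(tidytext)

# A toy one-row "survey" (sentence invented for illustration)
toy <- tibble(id = 1, text = "Great class -- the projects were FUN!")

# unnest_tokens() lowercases and strips punctuation by default
toy %>%
  unnest_tokens(word, text)
# One row per word: great, class, the, projects, were, fun
```

Note that the `id` column is carried along for each word, which is what lets us count words by course or modality later.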

Removing stop words

Stop words are common words that don’t carry much meaning (the, a, is, and, etc.). We typically remove these:

# tidytext includes a built-in stop word list
data("stop_words")
head(stop_words, 20)

# Remove stop words using anti_join
words_clean <- words %>%
  anti_join(stop_words, by = "word")

# How many words remain?
nrow(words_clean)

Custom stop words

Sometimes you’ll want to remove additional words that are common in your specific context but don’t add meaning:

# Create custom stop words for teaching evaluations
custom_stops <- tibble(word = c("class", "course", "professor", "carter", 
                                 "erin", "dr", "semester", "i'm", "i've",
                                 "it's", "don't", "didn't", "wasn't"))

# Remove both standard and custom stop words
words_clean <- words %>%
  anti_join(stop_words, by = "word") %>%
  anti_join(custom_stops, by = "word") %>%
  filter(str_detect(word, "^[a-z]+$"))  # Keep only purely alphabetic words (drops numbers and contractions)

nrow(words_clean)

Part 3: Word Frequencies

Now let’s see what words appear most often.

Overall word frequency

# Count word frequencies
word_counts <- words_clean %>%
  count(word, sort = TRUE)

# Top 20 words
word_counts %>%
  head(20)

Visualize top words

word_counts %>%
  head(20) %>%
  mutate(word = fct_reorder(word, n)) %>%
  ggplot(aes(x = n, y = word)) +
  geom_col(fill = "#2D6A4F") +
  labs(
    title = "Most Frequent Words in Teaching Evaluations",
    x = "Frequency",
    y = NULL
  ) +
  theme_minimal(base_size = 13)

Word frequency by course

# Count words within each course
word_counts_by_course <- words_clean %>%
  count(course_name, word, sort = TRUE)

# Top 10 words per course
word_counts_by_course %>%
  group_by(course_name) %>%
  slice_max(n, n = 10) %>%
  ungroup() %>%
  mutate(word = reorder_within(word, n, course_name)) %>%
  ggplot(aes(x = n, y = word, fill = course_name)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~course_name, scales = "free_y") +
  scale_y_reordered() +
  scale_fill_manual(values = c("#2D6A4F", "#52796F", "#C85A3B")) +
  labs(
    title = "Top Words by Course",
    x = "Frequency",
    y = NULL
  ) +
  theme_minimal(base_size = 11)

Part 4: Word Clouds

Word clouds are a popular (if sometimes maligned) way to visualize text data. They’re great for quick impressions and presentations.

Basic word cloud

# Using the wordcloud package
set.seed(42)  # For reproducibility

word_counts %>%
  with(wordcloud(
    words = word,
    freq = n,
    max.words = 100,
    random.order = FALSE,
    colors = brewer.pal(8, "Dark2")
  ))

Interactive word cloud with wordcloud2

# wordcloud2 creates an interactive HTML widget
wordcloud2(
  data = word_counts %>% head(100),
  size = 0.8,
  color = "random-dark",
  backgroundColor = "#FAF8F5"
)

Word clouds by group

Let’s create separate word clouds for in-person vs. online courses:

# Get word counts by modality
inperson_words <- words_clean %>%
  filter(modality == "in-person") %>%
  count(word, sort = TRUE)

online_words <- words_clean %>%
  filter(modality == "online") %>%
  count(word, sort = TRUE)

# In-person word cloud
par(mfrow = c(1, 2))  # Two plots side by side

# Note: wordcloud() has no main argument, so add titles with title()
wordcloud(
  words = inperson_words$word,
  freq = inperson_words$n,
  max.words = 75,
  random.order = FALSE,
  colors = brewer.pal(8, "Greens")[4:8]
)
title("In-Person")

wordcloud(
  words = online_words$word,
  freq = online_words$n,
  max.words = 75,
  random.order = FALSE,
  colors = brewer.pal(8, "Blues")[4:8]
)
title("Online")

par(mfrow = c(1, 1))  # Reset

Comparison cloud

A comparison cloud shows which words are distinctive to each group:

# Reshape for comparison cloud
comparison_data <- words_clean %>%
  count(modality, word) %>%
  pivot_wider(names_from = modality, values_from = n, values_fill = 0) %>%
  column_to_rownames("word") %>%
  as.matrix()

comparison.cloud(
  comparison_data,
  max.words = 100,
  colors = c("#2D6A4F", "#C85A3B"),
  title.size = 1.5
)

Part 5: Comparing Groups with TF-IDF

Word frequency alone can be misleading — common words will dominate across all groups. TF-IDF (Term Frequency-Inverse Document Frequency) helps identify words that are distinctive to each group.
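To make the calculation concrete, here is a hand-rolled sketch on two toy "documents" with invented words; bind_tf_idf() performs the same computation. TF is a word's count divided by its document's total word count; IDF is the natural log of the number of documents divided by the number of documents containing the word:

```r
library(dplyr)

# Two toy documents, A and B (words invented for illustration)
toy <- tibble(
  doc  = c("A", "A", "A", "B", "B", "B"),
  word = c("exam", "exam", "group", "group", "video", "video")
)

n_docs <- n_distinct(toy$doc)

toy %>%
  count(doc, word) %>%
  group_by(doc) %>%
  mutate(tf = n / sum(n)) %>%                       # share of the document's words
  group_by(word) %>%
  mutate(idf = log(n_docs / n_distinct(doc))) %>%   # penalize words found in every document
  ungroup() %>%
  mutate(tf_idf = tf * idf)
# "group" appears in both documents, so its idf (and tf_idf) is 0;
# "exam" and "video" are distinctive to one document each
```

This is why TF-IDF surfaces course-specific vocabulary: any word that shows up in every course gets an IDF of zero, no matter how frequent it is.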

# Calculate TF-IDF by course
course_tfidf <- words_clean %>%
  count(course_name, word, sort = TRUE) %>%
  bind_tf_idf(word, course_name, n)

# Top distinctive words per course
course_tfidf %>%
  group_by(course_name) %>%
  slice_max(tf_idf, n = 10) %>%
  ungroup() %>%
  mutate(word = reorder_within(word, tf_idf, course_name)) %>%
  ggplot(aes(x = tf_idf, y = word, fill = course_name)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~course_name, scales = "free_y") +
  scale_y_reordered() +
  scale_fill_manual(values = c("#2D6A4F", "#52796F", "#C85A3B")) +
  labs(
    title = "Most Distinctive Words by Course (TF-IDF)",
    subtitle = "Words that are uniquely common in each course",
    x = "TF-IDF",
    y = NULL
  ) +
  theme_minimal(base_size = 11)

Part 6: Sentiment Analysis

Sentiment analysis attempts to determine the emotional tone of text. We’ll use a lexicon-based approach, where words are matched against a dictionary of words labeled with sentiments.
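At its core, the lexicon-based approach is just a join. A toy example (words and labels invented here) shows how an inner join keeps only the words that appear in the lexicon:

```r
library(dplyr)

# Toy data: three words and a two-word "lexicon" (both invented for illustration)
toy_words   <- tibble(word = c("great", "boring", "syllabus"))
toy_lexicon <- tibble(word      = c("great", "boring"),
                      sentiment = c("positive", "negative"))

# Only words present in the lexicon survive the join;
# "syllabus" is dropped because it carries no sentiment label
inner_join(toy_words, toy_lexicon, by = "word")
```

Keep this in mind when interpreting results: most words in real comments are neutral and simply fall out of the analysis at this step.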

Load a sentiment lexicon

# The "bing" lexicon classifies words as positive or negative
# You may be prompted to download it the first time
bing <- get_sentiments("bing")

head(bing, 20)

Calculate sentiment

# Join words with sentiment lexicon
word_sentiments <- words_clean %>%
  inner_join(bing, by = "word")

# Count positive vs negative words
word_sentiments %>%
  count(sentiment)

Sentiment by course

# Sentiment breakdown by course
sentiment_by_course <- word_sentiments %>%
  count(course_name, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(
    total = positive + negative,
    pct_positive = positive / total
  )

sentiment_by_course

# Visualize
sentiment_by_course %>%
  pivot_longer(cols = c(positive, negative), names_to = "sentiment", values_to = "n") %>%
  ggplot(aes(x = course_name, y = n, fill = sentiment)) +
  geom_col(position = "fill") +
  scale_fill_manual(values = c("negative" = "#C85A3B", "positive" = "#2D6A4F")) +
  scale_y_continuous(labels = percent) +
  labs(
    title = "Sentiment Distribution by Course",
    x = NULL,
    y = "Percentage of sentiment-bearing words"
  ) +
  theme_minimal(base_size = 13) +
  coord_flip()

Sentiment by modality

# Compare in-person vs online sentiment
sentiment_by_modality <- word_sentiments %>%
  count(modality, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(
    total = positive + negative,
    pct_positive = positive / total
  )

sentiment_by_modality

Sentiment over time

# Extract year from semester
word_sentiments_time <- word_sentiments %>%
  mutate(year = str_extract(semester, "\\d{4}") %>% as.numeric())

# Sentiment by year
sentiment_by_year <- word_sentiments_time %>%
  count(year, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(
    total = positive + negative,
    pct_positive = positive / total
  )

ggplot(sentiment_by_year, aes(x = year, y = pct_positive)) +
  geom_line(color = "#2D6A4F", linewidth = 1) +
  geom_point(color = "#2D6A4F", size = 3) +
  scale_y_continuous(labels = percent, limits = c(0.5, 1)) +
  labs(
    title = "Sentiment Over Time",
    subtitle = "Percentage of positive sentiment-bearing words",
    x = "Year",
    y = "% Positive"
  ) +
  theme_minimal(base_size = 13)

Most common positive and negative words

# Top positive words
word_sentiments %>%
  filter(sentiment == "positive") %>%
  count(word, sort = TRUE) %>%
  head(15) %>%
  mutate(word = fct_reorder(word, n)) %>%
  ggplot(aes(x = n, y = word)) +
  geom_col(fill = "#2D6A4F") +
  labs(title = "Most Common Positive Words", x = "Frequency", y = NULL) +
  theme_minimal()

# Top negative words
word_sentiments %>%
  filter(sentiment == "negative") %>%
  count(word, sort = TRUE) %>%
  head(15) %>%
  mutate(word = fct_reorder(word, n)) %>%
  ggplot(aes(x = n, y = word)) +
  geom_col(fill = "#C85A3B") +
  labs(title = "Most Common Negative Words", x = "Frequency", y = NULL) +
  theme_minimal()

Part 7: Going Further

N-grams

Instead of single words, we can look at pairs (bigrams) or triplets (trigrams):

# Extract bigrams
bigrams <- evals %>%
  unnest_tokens(bigram, comment_text, token = "ngrams", n = 2) %>%
  filter(!is.na(bigram))

# Count bigrams
bigram_counts <- bigrams %>%
  count(bigram, sort = TRUE)

head(bigram_counts, 20)

Filter bigrams to remove stop words

# Separate, filter, and reunite
bigrams_filtered <- bigrams %>%
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% stop_words$word,
         !word2 %in% stop_words$word,
         !word1 %in% custom_stops$word,
         !word2 %in% custom_stops$word) %>%
  unite(bigram, word1, word2, sep = " ")

bigrams_filtered %>%
  count(bigram, sort = TRUE) %>%
  head(20)

Alternative sentiment lexicons

The tidytext package provides access to several sentiment lexicons:

# AFINN: words scored from -5 (very negative) to +5 (very positive)
afinn <- get_sentiments("afinn")

# Calculate average sentiment score per course
words_clean %>%
  inner_join(afinn, by = "word") %>%
  group_by(course_name) %>%
  summarize(
    avg_sentiment = mean(value),
    n_words = n()
  )

Wrapping Up

You’ve now learned the fundamentals of text analysis in R:

  1. Tokenization: Breaking text into words
  2. Stop word removal: Filtering out common words
  3. Word frequencies: Counting and visualizing common terms
  4. Word clouds: Visual summaries of text
  5. TF-IDF: Finding distinctive words across groups
  6. Sentiment analysis: Measuring emotional tone

These techniques form the foundation for more advanced text analysis methods like topic modeling, named entity recognition, and text classification.

Tip: Practice Ideas
  • Try analyzing your own text data (reviews, social media posts, survey responses)
  • Experiment with different stop word lists
  • Compare sentiment lexicons (bing vs. AFINN vs. NRC)
  • Create word clouds for different subsets of the data

Resources